Journal Club

Jay Brophy MD PhD

Departments of Medicine, Epidemiology and Biostatistics, McGill University

2025-05-21

STRIDE Trial Overview

STRIDE was a double-blind, randomised, placebo-controlled trial done at 112 outpatient clinical trial sites in 20 countries in North America, Asia, and Europe. Participants were aged 18 years and older, with type 2 diabetes and peripheral artery disease with intermittent claudication. 

Participants were randomly assigned (1:1) using an interactive web response system to receive subcutaneous semaglutide 1·0 mg once per week for 52 weeks or placebo.  

The primary endpoint was the ratio to baseline of the maximum walking distance at week 52, measured on a constant-load treadmill, in the full analysis set.
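A ratio-to-baseline endpoint gives the same absolute gain more weight when the baseline is low, and it turns an under-measured baseline into an apparent improvement. The sketch below uses hypothetical patients (illustrative numbers, not STRIDE data) to show both effects.

```r
# Hypothetical patients (illustrative numbers, not STRIDE data)
baseline <- c(100, 150, 200)      # baseline maximum walking distance (m)
week52   <- baseline + 20         # the same +20 m absolute gain for everyone
ratio    <- week52 / baseline     # ratio-to-baseline endpoint
round(ratio, 3)                   # 1.200 1.133 1.100

# A patient with no true change whose baseline was under-measured by 20 m
true_baseline  <- 150
obs_baseline   <- true_baseline - 20
apparent_ratio <- true_baseline / obs_baseline
round(apparent_ratio, 3)          # 1.154, an apparent ~15% "improvement"
```

The smaller the (observed) baseline, the larger the ratio for an identical change, which is why measurement error at baseline feeds directly into this endpoint.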

Why Change Scores Can Be Misleading

  • Regression to the mean occurs when participants with unusually low (or high) baseline values tend to shift toward the average on repeated measurement, even without any intervention.
  • If, by chance or through measurement error, the treatment group's observed baseline values are low, its apparent improvement from baseline will exceed its true improvement.
  • Altman and colleagues (Altman 2001) have shown that comparing change scores or ratios to baseline between groups introduces bias when baseline values are imbalanced or measured with noise.
  • The correct approach is ANCOVA: model the follow-up outcome and adjust for the baseline as a covariate.
  • Because baseline is measured before randomisation, adjusting for it keeps the randomised comparison intact while accounting for baseline imbalance and reducing the variance of the estimated treatment effect.
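In a two-arm trial both estimators can be written in terms of group means, which makes the difference explicit. Let $\bar Y_T, \bar Y_C$ be the mean follow-up and $\bar X_T, \bar X_C$ the mean observed baseline in the treated and control groups:

$$\hat\Delta_{\text{change}} = (\bar Y_T - \bar Y_C) - (\bar X_T - \bar X_C)$$

$$\hat\Delta_{\text{ANCOVA}} = (\bar Y_T - \bar Y_C) - \hat\beta\,(\bar X_T - \bar X_C)$$

Here $\hat\beta$ is the pooled within-group slope of follow-up on baseline estimated by the ANCOVA. The change score implicitly fixes $\beta = 1$. When follow-up tracks the true baseline and the observed baseline contains measurement error, the slope is attenuated below 1, so the change score over-corrects any baseline imbalance.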

Simulating Regression to the Mean that Inflates the Effect

  • The following simulation loosely mimics the STRIDE trial (Authors 2024), which assessed the effect of semaglutide on maximum walking distance measured on a treadmill.
  • The true treatment effect is 10 m.
  • The observed baseline is the true baseline plus measurement error (SD 20 m); in the semaglutide group this error is shifted down by 10 m on average, so that group's observed baseline is biased low.
  • The observed follow-up is the true baseline plus the treatment effect plus noise.
  • The naive analysis (change score) will overestimate the treatment effect.
  • The correct analysis (ANCOVA) adjusts for the observed baseline and gives a more accurate estimate, although the baseline's own measurement error means some bias remains.
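library(tibble)  # provides tibble()

set.seed(2027)
n <- 396  # patients per arm
baseline_true <- rnorm(2 * n, mean = 185, sd = 20)  # true baseline walking distance (m)
# Observed baseline = true baseline + measurement error (SD 20 m);
# the error is shifted by -10 m in the semaglutide arm (first n patients)
baseline_obs <- c(
  baseline_true[1:n] + rnorm(n, mean = -10, sd = 20),
  baseline_true[(n+1):(2*n)] + rnorm(n, mean = 0, sd = 20)
)
group <- rep(c("Semaglutide", "Placebo"), each = n)
treat <- ifelse(group == "Semaglutide", 1, 0)
true_effect <- 10  # true treatment effect (m)
# Follow-up depends on the TRUE baseline, not the error-prone observed one
followup <- baseline_true + true_effect * treat + rnorm(2 * n, mean = 0, sd = 20)
change <- followup - baseline_obs  # naive change score
df <- tibble(group, treat, baseline_true, baseline_obs, followup, change)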
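With the parameters in this code, the expected size of each estimator can be worked out by hand. Write $\hat\Delta_{\text{change}}$ for the difference in mean change score and $\hat\Delta_{\text{ANCOVA}}$ for the ANCOVA treatment coefficient. Because follow-up depends on the true baseline, the within-group slope of follow-up on observed baseline equals the reliability of the baseline measurement:

$$\lambda = \frac{\sigma^2_{\text{true}}}{\sigma^2_{\text{true}} + \sigma^2_{\text{error}}} = \frac{20^2}{20^2 + 20^2} = 0.5$$

The expected observed-baseline imbalance is $-10$ m and the expected follow-up difference is the true 10 m, so

$$E[\hat\Delta_{\text{change}}] = 10 - (-10) = 20 \text{ m}, \qquad \hat\Delta_{\text{ANCOVA}} \approx 10 - 0.5 \times (-10) = 15 \text{ m}.$$

The single run reported next (19.6 m naive, 13.7 m ANCOVA) is consistent with these values given standard errors of roughly 2 m.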

Published analyses

Naive Analysis (Wrong)  

[1] 19.62093
[1] 1.727469e-20
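The code behind the two printed values is not shown. A plausible reconstruction, assuming the first value is the semaglutide-minus-placebo difference in mean change score and the second is a t-test p-value, is sketched below; it regenerates the same simulated data with the same seed and draw order.

```r
# Regenerate the simulated trial with the same seed and draw order as above
set.seed(2027)
n <- 396
baseline_true <- rnorm(2 * n, mean = 185, sd = 20)
baseline_obs <- c(
  baseline_true[1:n] + rnorm(n, mean = -10, sd = 20),           # semaglutide arm
  baseline_true[(n + 1):(2 * n)] + rnorm(n, mean = 0, sd = 20)  # placebo arm
)
treat <- rep(c(1, 0), each = n)  # 1 = semaglutide, 0 = placebo (same order as `group`)
followup <- baseline_true + 10 * treat + rnorm(2 * n, mean = 0, sd = 20)
change <- followup - baseline_obs

# Naive analysis: compare mean change scores between arms
naive_diff <- mean(change[treat == 1]) - mean(change[treat == 0])
naive_test <- t.test(change[treat == 1], change[treat == 0])  # Welch t-test (assumed choice)
naive_diff
naive_test$p.value
```

If the slide's first number came from this calculation, `naive_diff` should reproduce it; the p-value may differ slightly if a pooled-variance test or a regression was used instead of Welch's test.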

Correct Analysis: ANCOVA  


Call:
lm(formula = followup ~ treat + baseline_obs, data = df)

Residuals:
    Min      1Q  Median      3Q     Max 
-91.273 -16.333   0.104  15.567  91.224 

Coefficients:
              Estimate Std. Error t value Pr(>|t|)    
(Intercept)  101.10181    5.83737  17.320  < 2e-16 ***
treat         13.68387    1.78125   7.682 4.64e-14 ***
baseline_obs   0.46234    0.03083  14.995  < 2e-16 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 24.6 on 789 degrees of freedom
Multiple R-squared:  0.2398,    Adjusted R-squared:  0.2379 
F-statistic: 124.5 on 2 and 789 DF,  p-value: < 2.2e-16
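A single seed can mislead, so a hedged check is to repeat the simulation many times and average the two estimators. `simulate_once()` is a hypothetical helper that follows the same data-generating assumptions as the simulation code above (true effect 10 m, baseline error SD 20 m, a -10 m error shift in the treated arm).

```r
# Hypothetical helper: simulate one trial under the slide's assumptions and
# return the naive change-score and ANCOVA estimates of the treatment effect
simulate_once <- function(n = 396, true_effect = 10, shift = -10, err_sd = 20) {
  baseline_true <- rnorm(2 * n, mean = 185, sd = 20)
  treat <- rep(c(1, 0), each = n)
  # observed baseline: error with mean `shift` in the treated arm, 0 in control
  baseline_obs <- baseline_true + rnorm(2 * n, mean = shift * treat, sd = err_sd)
  followup <- baseline_true + true_effect * treat + rnorm(2 * n, mean = 0, sd = 20)
  change <- followup - baseline_obs
  c(
    naive  = mean(change[treat == 1]) - mean(change[treat == 0]),
    ancova = unname(coef(lm(followup ~ treat + baseline_obs))["treat"])
  )
}

set.seed(1)
reps <- replicate(1000, simulate_once())  # 2 x 1000 matrix of estimates
m <- rowMeans(reps)
m
```

Across replications the naive estimate should average close to 20 m and the ANCOVA estimate close to 15 m: ANCOVA removes about half of the artefactual gain here, because the baseline reliability is 0.5, but neither recovers the true 10 m.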

Visualization: Inflation of Treatment Effect

Bayesian ANCOVA

MCMC: 4 chains run sequentially, each with 1,000 warmup and 2,000 sampling iterations; all chains finished successfully (total execution time 6.5 seconds).

Posterior Summary and Visualization

# A tibble: 1 × 10
  variable  mean median    sd   mad    q5   q95  rhat ess_bulk ess_tail
  <chr>    <dbl>  <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>    <dbl>    <dbl>
1 adj_diff  13.7   13.7  1.79  1.76  10.7  16.6  1.00    4128.    3683.
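The Stan model and priors used above are not shown. As a hedged stand-in, the sketch below draws exactly from the posterior of the same ANCOVA under a flat prior on the coefficients and on log σ, a standard conjugate result that needs no MCMC. This is a swapped-in technique, not the author's model; with this prior the posterior for the treatment effect centres on the least-squares estimate, which is plausibly why the MCMC summary closely matches the `lm()` output.

```r
# Regenerate the simulated trial (same seed and draw order as the simulation code)
set.seed(2027)
n <- 396
baseline_true <- rnorm(2 * n, mean = 185, sd = 20)
baseline_obs <- c(
  baseline_true[1:n] + rnorm(n, mean = -10, sd = 20),
  baseline_true[(n + 1):(2 * n)] + rnorm(n, mean = 0, sd = 20)
)
treat <- rep(c(1, 0), each = n)
followup <- baseline_true + 10 * treat + rnorm(2 * n, mean = 0, sd = 20)

# ANCOVA design matrix: intercept, treatment indicator, observed baseline
X <- cbind(intercept = 1, treat = treat, baseline_obs = baseline_obs)
p <- ncol(X)
fit <- lm.fit(X, followup)
beta_hat <- fit$coefficients
df_resid <- nrow(X) - p
s2 <- sum(fit$residuals^2) / df_resid
XtX_inv <- solve(crossprod(X))
R0 <- chol(XtX_inv)  # upper triangular, t(R0) %*% R0 equals XtX_inv

# Flat prior p(beta, log sigma) proportional to 1 gives:
#   sigma^2 | y       ~ scaled inverse chi-squared(df_resid, s2)
#   beta | sigma^2, y ~ Normal(beta_hat, sigma^2 * XtX_inv)
set.seed(1)
n_draws <- 8000
sigma2 <- df_resid * s2 / rchisq(n_draws, df = df_resid)
Z <- matrix(rnorm(n_draws * p), nrow = n_draws, ncol = p)
draws <- sweep((Z %*% R0) * sqrt(sigma2), 2, beta_hat, "+")  # row i scaled by sigma_i
colnames(draws) <- colnames(X)

adj_diff <- draws[, "treat"]
c(mean = mean(adj_diff), sd = sd(adj_diff), quantile(adj_diff, c(0.05, 0.95)))
```

A weakly informative prior in Stan would move these numbers only slightly with 792 patients; a Bayesian analysis inherits the same baseline-measurement-error bias as the frequentist ANCOVA.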

Selection Bias in STRIDE Trial

  • Table 2 in STRIDE shows only 338/396 (85%) in the semaglutide group and 345/396 (87%) in the placebo group were analyzed for the primary outcome (Authors 2024).
  • This means about 14% of randomised patients (109/792) had no primary-outcome data: 15% in the semaglutide group and 13% in the placebo group.
  • If missingness is not random (e.g., related to tolerability or worsening condition), the observed effect size is biased.
  • This is particularly concerning given the subjective and effort-based nature of the walking test.
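The missing-data proportions follow directly from the Table 2 counts quoted above:

```r
analysed   <- c(semaglutide = 338, placebo = 345)  # analysed for the primary outcome
randomised <- c(semaglutide = 396, placebo = 396)

missing_by_arm  <- 1 - analysed / randomised
overall_missing <- 1 - sum(analysed) / sum(randomised)

round(100 * missing_by_arm, 1)   # semaglutide 14.6, placebo 12.9
round(100 * overall_missing, 1)  # 13.8
```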

Corrected Sensitivity Simulation: Inflated Treatment from Selective Missingness

[1] 4.031132
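The direction of this bias can be sketched with a hypothetical missingness mechanism (an assumption, not taken from STRIDE): placebo outcomes are missing completely at random at 13%, while in the semaglutide arm a worse week-52 walking distance makes a missing outcome more likely (about 15% overall, via an assumed logistic model). Very large arms are used so the bias is not hidden by sampling noise.

```r
set.seed(42)
n <- 50000                # very large arms so the bias is not hidden by noise
true_effect <- 10         # true treatment effect (m)
treat <- rep(c(1, 0), each = n)
baseline <- rnorm(2 * n, mean = 185, sd = 20)
followup <- baseline + true_effect * treat + rnorm(2 * n, mean = 0, sd = 20)

# Hypothetical missingness mechanisms (assumptions, not STRIDE data):
#   placebo     - missing completely at random with probability 0.13
#   semaglutide - missing not at random: lower follow-up distance -> more likely missing
z <- as.numeric(scale(followup[treat == 1]))  # standardised within the semaglutide arm
p_miss <- rep(0.13, 2 * n)
p_miss[treat == 1] <- plogis(-2 - z)          # averages roughly 0.15
observed <- runif(2 * n) > p_miss

full_diff <- mean(followup[treat == 1]) - mean(followup[treat == 0])
cc_diff <- mean(followup[treat == 1 & observed]) -
  mean(followup[treat == 0 & observed])
missing_sema <- mean(!observed[treat == 1])

c(full_data = full_diff, complete_case = cc_diff, missing_semaglutide = missing_sema)
```

The complete-case difference should come out a few metres above the full-data difference even though the overall missingness is similar in both arms. A different mechanism would give a different bias, so this illustrates direction, not STRIDE's actual bias.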

Final Discussion Slide: Key Take-Home Points

  • Naive change score analyses are vulnerable to regression to the mean, especially when measurement error exists.
  • The STRIDE trial’s analysis likely overestimated the treatment benefit due to this bias.
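  • The trial is also missing primary-outcome data for about 14% of patients (15% semaglutide, 13% placebo), without clear methods to handle this, risking selection bias.
  • ANCOVA and Bayesian ANCOVA reduce the regression-to-the-mean bias (13.7 m vs 19.6 m for the naive analysis in the simulation, against a true 10 m) and offer a more reliable estimate of treatment effect, but they do not fully remove bias when the baseline itself is measured with error, and neither addresses missing data on its own.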
  • Clinicians should be wary of simple pre-post comparisons and always ask: “Was the analysis adjusted for baseline?”
Altman, DG. 2001. “Why We Need Confidence Intervals.” BMJ 323 (6308): 903–5.
Authors, STRIDE Trial Investigators. 2024. “Semaglutide in Patients with Peripheral Artery Disease and Claudication (STRIDE): A Randomised, Double-Blind, Placebo-Controlled, Phase 2 Trial.” The Lancet. https://doi.org/10.1016/S0140-6736(24)00000-0.